1. Introduction.

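Let $X$ be an $n$-dimensional random variable whose density function is a convex combination of normal densities, i.e.,

$$p(x) = \sum_{i=1}^{m} \alpha_i^0 \, p_i(x) \qquad \text{for } x \in \mathbb{R}^n,$$

where

$$\alpha_i^0 > 0, \qquad \sum_{i=1}^{m} \alpha_i^0 = 1,$$

and

$$p_i(x) = \frac{1}{(2\pi)^{n/2} \, |\Sigma_i^0|^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu_i^0)^T (\Sigma_i^0)^{-1} (x - \mu_i^0) \right), \qquad i = 1, \ldots, m.$$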
If $\{x_k\}_{k=1,\ldots,N}$ is an independent sample of observations on $X$, then a maximum-likelihood estimate of the parameters $\{\alpha_i^0, \mu_i^0, \Sigma_i^0\}_{i=1,\ldots,m}$ is a choice of parameters $\{\hat\alpha_i, \hat\mu_i, \hat\Sigma_i\}_{i=1,\ldots,m}$ which locally maximizes the log-likelihood function

$$L = \sum_{k=1}^{N} \log p(x_k),$$

in which $p$ is evaluated with the true parameters $\{\alpha_i^0, \mu_i^0, \Sigma_i^0\}_{i=1,\ldots,m}$ replaced by the estimate $\{\hat\alpha_i, \hat\mu_i, \hat\Sigma_i\}_{i=1,\ldots,m}$. (In the following, it is clear from the context which parameters are used in evaluating the density functions $p_i$ and $p$. Therefore, these parameters are not explicitly pointed out.)
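For concreteness, the density $p$ and the log-likelihood $L$ can be evaluated directly. The Python sketch below assumes the univariate case $n = 1$ (so each $\Sigma_i$ is a variance) and uses hypothetical helper names (`normal_pdf`, `mixture_pdf`, `log_likelihood`, `draw_sample`) and illustrative parameter values that do not come from the text; it only evaluates $L$ and does not maximize it.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Univariate normal density p_i(x): the case n = 1 with Sigma_i = var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, alphas, mus, variances):
    """Convex combination p(x) = sum_i alpha_i * p_i(x)."""
    return sum(a * normal_pdf(x, mu, v) for a, mu, v in zip(alphas, mus, variances))


def log_likelihood(sample, alphas, mus, variances):
    """L = sum_k log p(x_k), with p evaluated at the supplied parameters."""
    return sum(math.log(mixture_pdf(x, alphas, mus, variances)) for x in sample)


def draw_sample(n_obs, alphas, mus, variances, seed=0):
    """Hypothetical helper: an independent sample drawn from the mixture."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n_obs):
        i = rng.choices(range(len(alphas)), weights=alphas)[0]  # pick component i
        sample.append(rng.gauss(mus[i], math.sqrt(variances[i])))
    return sample


# Illustrative "true" parameters (alpha^0, mu^0, Sigma^0) with m = 2.
true_params = ([0.4, 0.6], [-2.0, 3.0], [1.0, 0.5])
xs = draw_sample(500, *true_params)
print(log_likelihood(xs, *true_params))
print(log_likelihood(xs, [0.4, 0.6], [-1.0, 2.0], [1.0, 0.5]))  # means shifted
```

With these illustrative values the log-likelihood at the generating parameters should exceed the value at the shifted means; the maximum-likelihood estimate itself is whatever locally maximizes $L$.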

Clearly, $L$ is a differentiable function of the parameters to be estimated. Equating to zero the partial derivatives of $L$ with respect to these parameters, one obtains, after a straightforward calculation, the following necessary conditions for a maximum-likelihood estimate:


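$$\hat\alpha_i = \frac{1}{N} \sum_{k=1}^{N} \frac{\hat\alpha_i \, p_i(x_k)}{p(x_k)}, \tag{1.a}$$

$$\hat\mu_i = \left( \frac{1}{N} \sum_{k=1}^{N} x_k \frac{p_i(x_k)}{p(x_k)} \right) \Big/ \left( \frac{1}{N} \sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k)} \right), \tag{1.b}$$

$$\hat\Sigma_i = \left( \frac{1}{N} \sum_{k=1}^{N} (x_k - \hat\mu_i)(x_k - \hat\mu_i)^T \frac{p_i(x_k)}{p(x_k)} \right) \Big/ \left( \frac{1}{N} \sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k)} \right), \tag{1.c}$$

for $i = 1, \ldots, m$.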




These are known as the likelihood equations, and we shall assume that the parameters under consideration here are restricted to sets in which these equations are sufficient, as well as necessary, for a maximum-likelihood estimate.
The likelihood equations suggest the following iterative procedure for obtaining a solution: Beginning with some set of starting values, obtain successive approximations to a solution by inserting the preceding approximations in the expressions on the right-hand sides of (1.a), (1.b), and (1.c).

This scheme is attractive for its relative ease of implementation, and it has been investigated by a number of authors. Empirical studies of Day [1], Duda and Hart [2], and Hasselblad [3] suggest that this scheme is convergent and that convergence is particularly fast when the component normal densities in $p$ are "widely separated" in a certain sense. No proof of convergence is given in these papers, although Peters and Walker [8] have shown that, with probability approaching 1 as $N$ approaches infinity, a related procedure (which includes this one as a special case) converges locally to the consistent maximum-likelihood estimate whenever a certain "step-size" is sufficiently small. (An iterative procedure is said to converge locally to a limit if the iterates converge to that limit whenever the starting values are sufficiently near that limit.)
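The scheme described above can be sketched as follows in the univariate case $n = 1$. This is a minimal illustration under assumptions, not the procedure of any of the cited papers: the function name `sweep`, the synthetic data, the starting values, and the fixed number of passes are all hypothetical. Following the text literally, (1.c) is evaluated with the preceding approximation to $\mu_i$; a common variant uses the freshly updated mean instead, and the two agree at a solution.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Univariate normal density (n = 1)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def sweep(sample, alphas, mus, variances):
    """One step of the scheme for n = 1: the preceding approximations are
    inserted in the right-hand sides of (1.a), (1.b), and (1.c)."""
    m, N = len(alphas), len(sample)
    dens = [[normal_pdf(x, mus[i], variances[i]) for x in sample] for i in range(m)]
    p = [sum(alphas[i] * dens[i][k] for i in range(m)) for k in range(N)]
    new_alphas, new_mus, new_vars = [], [], []
    for i in range(m):
        r = [dens[i][k] / p[k] for k in range(N)]  # p_i(x_k) / p(x_k)
        s = sum(r) / N                             # (1/N) sum_k p_i(x_k) / p(x_k)
        new_alphas.append(alphas[i] * s)                                     # (1.a)
        new_mus.append(sum(w * x for w, x in zip(r, sample)) / N / s)        # (1.b)
        new_vars.append(
            sum(w * (x - mus[i]) ** 2 for w, x in zip(r, sample)) / N / s)  # (1.c)
    return new_alphas, new_mus, new_vars


# Synthetic data from a hypothetical two-component mixture.
rng = random.Random(1)
data = [rng.gauss(-2.0, 1.0) if rng.random() < 0.4 else rng.gauss(3.0, math.sqrt(0.5))
        for _ in range(1000)]
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])  # starting values
for _ in range(300):
    params = sweep(data, *params)
print(params)
```

Note that (1.a) preserves $\sum_i \alpha_i = 1$ automatically, since $\sum_i \alpha_i p_i(x_k)/p(x_k) = 1$ for every $k$.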

Peters and Coberly [7] have proved that, if all of the parameters $\mu_i$ and $\Sigma_i$ are held fixed, then the iterative procedure suggested by the equations (1.a) alone converges locally to a maximum-likelihood estimate of the parameters $\alpha_i^0$, $i = 1, \ldots, m$. They also report on numerical studies in which the computational feasibility of this procedure is demonstrated. In this note, we provide sufficient conditions for the iterative procedure suggested by the equations (1.b) alone, for fixed parameters $\alpha_i$ and $\Sigma_i$, to converge locally to a maximum-likelihood estimate of the means $\mu_i^0$, $i = 1, \ldots, m$. These conditions are, roughly, that either $m = 2$ or the component normal densities in $p$ be "widely separated" in a certain sense.
2. Preliminary discussion.

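We denote by $\mathfrak{M}$ the $m$-fold direct sum of $\mathbb{R}^n$ with itself, and we represent its elements as columns

$$\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_m \end{pmatrix}.$$

(Of course, $\mathfrak{M}$ is isomorphic to $\mathbb{R}^{mn}$.) We also find it convenient to represent parameter sets $\{\alpha_i\}_{i=1,\ldots,m}$ and $\{\Sigma_i\}_{i=1,\ldots,m}$ as columns

$$\alpha = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_m \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_1 \\ \vdots \\ \Sigma_m \end{pmatrix},$$

and, in the following, we use the fact that $\alpha$ and $\Sigma$ belong to normed vector spaces without explicitly introducing these spaces or their norms.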

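Setting

$$M_i(\alpha, \mu, \Sigma) = \left( \sum_{k=1}^{N} x_k \frac{p_i(x_k)}{p(x_k)} \right) \Big/ \left( \sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k)} \right), \qquad i = 1, \ldots, m,$$

we define

$$M(\alpha, \mu, \Sigma) = \begin{pmatrix} M_1(\alpha, \mu, \Sigma) \\ \vdots \\ M_m(\alpha, \mu, \Sigma) \end{pmatrix},$$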
which we regard as a function from $\mathfrak{M}$ to itself depending on parameters $\alpha$ and $\Sigma$. The equations (1.b) can now be written as

$$\mu = M(\alpha, \mu, \Sigma), \tag{2}$$

and the iterative procedure under consideration is the following: Beginning with some starting value $\mu^{(1)}$, define successive iterates inductively by

$$\mu^{(k+1)} = M(\alpha, \mu^{(k)}, \Sigma) \tag{3}$$

for $k = 1, 2, \ldots$.
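Procedure (3) can be sketched for $n = 1$ as below, with $\alpha$ and $\Sigma$ held fixed (here, for illustration only, at the values used to generate synthetic data). The names `M` and `iterate_means`, the tolerance, and the iteration cap are assumptions of this sketch rather than anything specified in the text.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Univariate normal density (n = 1)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def M(sample, alphas, mus, variances):
    """The map M(alpha, mu, Sigma) for n = 1: componentwise right-hand side of (1.b)."""
    m = len(alphas)
    num, den = [0.0] * m, [0.0] * m
    for x in sample:
        dens = [normal_pdf(x, mus[i], variances[i]) for i in range(m)]
        px = sum(a * d for a, d in zip(alphas, dens))
        for i in range(m):
            w = dens[i] / px  # p_i(x_k) / p(x_k)
            num[i] += w * x
            den[i] += w
    return [num[i] / den[i] for i in range(m)]


def iterate_means(sample, alphas, mu_start, variances, tol=1e-12, max_iter=1000):
    """Procedure (3): mu^(k+1) = M(alpha, mu^(k), Sigma) with alpha, Sigma held fixed.
    Returns the last iterate and the number of iterations used."""
    mu = list(mu_start)
    for k in range(1, max_iter + 1):
        new_mu = M(sample, alphas, mu, variances)
        if max(abs(u - v) for u, v in zip(new_mu, mu)) < tol:
            return new_mu, k
        mu = new_mu
    return mu, max_iter


rng = random.Random(2)
data = [rng.gauss(-2.0, 1.0) if rng.random() < 0.4 else rng.gauss(3.0, math.sqrt(0.5))
        for _ in range(800)]
alphas, variances = [0.4, 0.6], [1.0, 0.5]  # held fixed (hypothetical values)
mu_hat, steps = iterate_means(data, alphas, [-1.0, 2.0], variances)
print(mu_hat, steps)
```

The returned `steps` counts the iterations needed to meet the tolerance; a small count is consistent with the fast convergence reported for well-separated components, but a single run says nothing about convergence in general.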

In our results concerning the convergence of the procedure (3), the Fréchet derivative of $M$ with respect to $\mu$, which we denote by $\nabla_\mu M$, is of central importance. (For questions concerning the definition and properties of Fréchet derivatives, see Luenberger [6].) Indeed, if $\alpha$, $\mu$, and $\Sigma$ satisfy (2) and if $\|\cdot\|$ is any norm on $\mathfrak{M}$, then one can write

$$M(\alpha, \mu', \Sigma) - \mu = \nabla_\mu M(\alpha, \mu, \Sigma)(\mu' - \mu) + o(\|\mu' - \mu\|)$$

for $\mu'$ near $\mu$. Consequently, if there exists a norm $\|\cdot\|$ on $\mathfrak{M}$ with respect to which $\nabla_\mu M(\alpha, \mu, \Sigma)$ has operator norm less than 1, then $M$ is locally contractive in that norm near $\mu$, i.e., there is a number $\lambda$, $0 \le \lambda < 1$, such that

$$\|M(\alpha, \mu', \Sigma) - \mu\| \le \lambda \, \|\mu' - \mu\| \tag{4}$$
whenever $\mu'$ is sufficiently near $\mu$. An inequality of the form (4) implies the local convergence of the iterative procedure (3) to $\mu$, since it gives $\|\mu^{(k+1)} - \mu\| \le \lambda \|\mu^{(k)} - \mu\|$, so that the errors decrease geometrically once an iterate is sufficiently near $\mu$. Our objectives will therefore be met by giving sufficient conditions for $\nabla_\mu M(\alpha, \mu, \Sigma)$ to have operator norm less than 1 (with respect to some norm on $\mathfrak{M}$) at parameter vectors $\alpha$, $\mu$, and $\Sigma$ which satisfy (2).

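We now calculate $\nabla_\mu M$ at a set of parameter vectors $\hat\alpha$, $\hat\mu$, and $\hat\Sigma$ (with components $\hat\alpha_i$, $\hat\mu_i$, and $\hat\Sigma_i$, $i = 1, \ldots, m$) which satisfy the likelihood equations.

We first define inner products $\langle \cdot, \cdot \rangle_i$ on $\mathbb{R}^n$ by

$$\langle x, y \rangle_i = \hat\alpha_i \, x^T \hat\Sigma_i^{-1} y \qquad \text{for } x, y \in \mathbb{R}^n, \; i = 1, \ldots, m.$$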


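Then, denoting the Fréchet derivative of $M_i$ with respect to $\mu_j$ by $\nabla_{\mu_j} M_i$, one verifies with the aid of the likelihood equations that

$$\nabla_{\mu_j} M_i(\hat\alpha, \hat\mu, \hat\Sigma) = \begin{cases} -\dfrac{1}{N} \displaystyle\sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k)} (x_k - \hat\mu_i) \, \frac{p_j(x_k)}{p(x_k)} \langle x_k - \hat\mu_j, \cdot \rangle_j & \text{if } i \ne j, \\[3ex] I - \dfrac{1}{N} \displaystyle\sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k)} (x_k - \hat\mu_i) \, \frac{p_i(x_k)}{p(x_k)} \langle x_k - \hat\mu_i, \cdot \rangle_i & \text{if } i = j. \end{cases}$$

This yields the following expression, in the form of a matrix of Fréchet derivatives, for $\nabla_\mu M$ at a solution of the likelihood equations: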


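$$\nabla_\mu M(\hat\alpha, \hat\mu, \hat\Sigma) = \begin{pmatrix} \nabla_{\mu_1} M_1 & \cdots & \nabla_{\mu_m} M_1 \\ \vdots & & \vdots \\ \nabla_{\mu_1} M_m & \cdots & \nabla_{\mu_m} M_m \end{pmatrix} = I - \frac{1}{N} \sum_{k=1}^{N} \begin{pmatrix} \frac{p_1(x_k)}{p(x_k)} (x_k - \hat\mu_1) \\ \vdots \\ \frac{p_m(x_k)}{p(x_k)} (x_k - \hat\mu_m) \end{pmatrix} \left( \frac{p_1(x_k)}{p(x_k)} \langle x_k - \hat\mu_1, \cdot \rangle_1, \; \ldots, \; \frac{p_m(x_k)}{p(x_k)} \langle x_k - \hat\mu_m, \cdot \rangle_m \right). \tag{5}$$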
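Formula (5) can be checked numerically by comparing it with central finite differences of $M$ at an approximate solution of the likelihood equations. The sketch below assumes $n = 1$ and takes the inner products to be $\langle a, b \rangle_j = \hat\alpha_j \, a b / \hat\Sigma_j$ (the one-dimensional form of a weighted inner product; this choice is an assumption of the sketch). The helper names, the synthetic data, and the number of substitution passes in `fit` are hypothetical, and `fit` uses the updated means in (1.c).

```python
import math
import random


def normal_pdf(x, mu, var):
    """Univariate normal density (n = 1)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def ratios(sample, alphas, mus, variances):
    """r[i][k] = p_i(x_k) / p(x_k)."""
    dens = [[normal_pdf(x, mu, v) for x in sample] for mu, v in zip(mus, variances)]
    p = [sum(a * d[k] for a, d in zip(alphas, dens)) for k in range(len(sample))]
    return [[d[k] / p[k] for k in range(len(sample))] for d in dens]


def M(sample, alphas, mus, variances):
    """The map M(alpha, mu, Sigma): right-hand side of (1.b)."""
    return [sum(w * x for w, x in zip(r, sample)) / sum(r)
            for r in ratios(sample, alphas, mus, variances)]


def fit(sample, alphas, mus, variances, sweeps=1000):
    """Approximate solution of (1.a)-(1.c) by repeated substitution."""
    N = len(sample)
    for _ in range(sweeps):
        r = ratios(sample, alphas, mus, variances)
        alphas = [a * sum(ri) / N for a, ri in zip(alphas, r)]
        mus = [sum(w * x for w, x in zip(ri, sample)) / sum(ri) for ri in r]
        variances = [sum(w * (x - mu) ** 2 for w, x in zip(ri, sample)) / sum(ri)
                     for ri, mu in zip(r, mus)]
    return alphas, mus, variances


def jacobian_formula(sample, alphas, mus, variances):
    """Entries of (5) for n = 1, with <a, b>_j taken as alpha_j * a * b / Sigma_j."""
    N, m = len(sample), len(alphas)
    r = ratios(sample, alphas, mus, variances)
    g = [a / v for a, v in zip(alphas, variances)]
    return [[(1.0 if i == j else 0.0)
             - sum(r[i][k] * (x - mus[i]) * r[j][k] * g[j] * (x - mus[j])
                   for k, x in enumerate(sample)) / N
             for j in range(m)] for i in range(m)]


def jacobian_fd(sample, alphas, mus, variances, h=1e-5):
    """Central finite differences of M in mu, with alpha and Sigma held fixed."""
    m = len(mus)
    J = [[0.0] * m for _ in range(m)]
    for j in range(m):
        up, down = list(mus), list(mus)
        up[j] += h
        down[j] -= h
        M_up, M_down = M(sample, alphas, up, variances), M(sample, alphas, down, variances)
        for i in range(m):
            J[i][j] = (M_up[i] - M_down[i]) / (2.0 * h)
    return J


rng = random.Random(3)
data = [rng.gauss(-1.5, 1.0) if rng.random() < 0.5 else rng.gauss(1.5, 1.0)
        for _ in range(300)]
a_hat, mu_hat, var_hat = fit(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
J_formula = jacobian_formula(data, a_hat, mu_hat, var_hat)
J_numeric = jacobian_fd(data, a_hat, mu_hat, var_hat)
print(J_formula)
print(J_numeric)
```

The agreement depends on the parameters actually satisfying (1.a) and (1.c); away from a solution of the likelihood equations the diagonal blocks of (5) are no longer of the form $I$ minus the displayed sum.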


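The inner products $\langle \cdot, \cdot \rangle_i$ induce an inner product $\langle \cdot, \cdot \rangle$ on $\mathfrak{M}$, namely $\langle u, v \rangle = \sum_{i=1}^{m} \langle u_i, v_i \rangle_i$. In the following, $\|\cdot\|$ will denote both the vector norm and the operator norm defined by this inner product. It is apparent from (5) that, at a solution of the likelihood equations, $\nabla_\mu M$ is of the form $I - Q$, where $Q$ is positive semi-definite and symmetric with respect to the inner product $\langle \cdot, \cdot \rangle$: writing $a_k$ for the $k$-th column vector in (5), one has $Qv = \frac{1}{N} \sum_{k=1}^{N} a_k \langle a_k, v \rangle$. In fact, we prove in an appendix that $Q$ is positive-definite with probability 1 whenever $N \ge mn$. It follows that, with probability 1 for $N \ge mn$, $\|\nabla_\mu M\| < 1$ at a solution of the likelihood equations if and only if $\|Q\| < 2$ (the eigenvalues of $I - Q$ are $1 - \lambda$, where $\lambda$ ranges over the eigenvalues of $Q$, all of which then lie in $(0, \|Q\|]$). We conclude these preliminary remarks with the following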

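Lemma: $\|Q\| < m$.

Proof: Since $Q$ is symmetric and positive semi-definite with respect to $\langle \cdot, \cdot \rangle$, one has

$$\|Q\| = \sup_{\|v\| \le 1} \langle v, Qv \rangle.$$

If $\{v_i\}_{i=1,\ldots,m} \subset \mathbb{R}^n$ is such that

$$v = \begin{pmatrix} v_1 \\ \vdots \\ v_m \end{pmatrix}$$

is non-zero and satisfies $\|v\| \le 1$, then

$$\begin{aligned}
\langle v, Qv \rangle &= \frac{1}{N} \sum_{k=1}^{N} \left( \sum_{i=1}^{m} \frac{p_i(x_k)}{p(x_k)} \langle x_k - \hat\mu_i, v_i \rangle_i \right)^2 \\
&= \sum_{i=1}^{m} \sum_{j=1}^{m} \frac{1}{N} \sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k)} \langle x_k - \hat\mu_i, v_i \rangle_i \, \frac{p_j(x_k)}{p(x_k)} \langle x_k - \hat\mu_j, v_j \rangle_j \\
&\le \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \frac{1}{N} \sum_{k=1}^{N} \left( \frac{p_i(x_k)}{p(x_k)} \right)^2 \langle x_k - \hat\mu_i, v_i \rangle_i^2 \right)^{1/2} \left( \frac{1}{N} \sum_{k=1}^{N} \left( \frac{p_j(x_k)}{p(x_k)} \right)^2 \langle x_k - \hat\mu_j, v_j \rangle_j^2 \right)^{1/2} \\
&< \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \frac{1}{\hat\alpha_i} \frac{1}{N} \sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k)} \langle x_k - \hat\mu_i, v_i \rangle_i^2 \right)^{1/2} \left( \frac{1}{\hat\alpha_j} \frac{1}{N} \sum_{k=1}^{N} \frac{p_j(x_k)}{p(x_k)} \langle x_k - \hat\mu_j, v_j \rangle_j^2 \right)^{1/2},
\end{aligned}$$

by the Cauchy-Schwarz inequality and since $\hat\alpha_i p_i(x)/p(x) < 1$, so that $(p_i(x)/p(x))^2 < \hat\alpha_i^{-1} \, p_i(x)/p(x)$, for $i = 1, \ldots, m$. From the likelihood equations (1.a) and (1.c),

$$\frac{1}{N} \sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k)} \langle x_k - \hat\mu_i, v_i \rangle_i^2 = \hat\alpha_i^2 \, v_i^T \hat\Sigma_i^{-1} \hat\Sigma_i \hat\Sigma_i^{-1} v_i = \hat\alpha_i \langle v_i, v_i \rangle_i,$$

and one concludes that

$$\langle v, Qv \rangle < \sum_{i=1}^{m} \sum_{j=1}^{m} \langle v_i, v_i \rangle_i^{1/2} \langle v_j, v_j \rangle_j^{1/2} = \left( \sum_{i=1}^{m} \langle v_i, v_i \rangle_i^{1/2} \right)^2 \le m \sum_{i=1}^{m} \langle v_i, v_i \rangle_i \le m,$$

and the lemma is proved.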
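The bound of the Lemma can be compared with the actual value of $\|Q\|$ on a small example. For $n = 1$, and again assuming $\langle a, b \rangle_i = \hat\alpha_i \, a b / \hat\Sigma_i$, the matrix $S = G^{1/2} Q G^{-1/2}$ with $G = \mathrm{diag}(\hat\alpha_i / \hat\Sigma_i)$ is symmetric and has the same eigenvalues as $Q$, so $\|Q\|$ is its largest eigenvalue. The sketch below (hypothetical names, synthetic data with $m = 3$, and a fixed number of passes in `fit`) estimates that eigenvalue by power iteration.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Univariate normal density (n = 1)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def ratios(sample, alphas, mus, variances):
    """r[i][k] = p_i(x_k) / p(x_k)."""
    dens = [[normal_pdf(x, mu, v) for x in sample] for mu, v in zip(mus, variances)]
    p = [sum(a * d[k] for a, d in zip(alphas, dens)) for k in range(len(sample))]
    return [[d[k] / p[k] for k in range(len(sample))] for d in dens]


def fit(sample, alphas, mus, variances, sweeps=1000):
    """Approximate solution of (1.a)-(1.c) by repeated substitution."""
    N = len(sample)
    for _ in range(sweeps):
        r = ratios(sample, alphas, mus, variances)
        alphas = [a * sum(ri) / N for a, ri in zip(alphas, r)]
        mus = [sum(w * x for w, x in zip(ri, sample)) / sum(ri) for ri in r]
        variances = [sum(w * (x - mu) ** 2 for w, x in zip(ri, sample)) / sum(ri)
                     for ri, mu in zip(r, mus)]
    return alphas, mus, variances


def q_symmetrized(sample, alphas, mus, variances):
    """S = G^{1/2} Q G^{-1/2} for n = 1, with G = diag(alpha_i / Sigma_i)."""
    N, m = len(sample), len(alphas)
    r = ratios(sample, alphas, mus, variances)
    b = [[math.sqrt(alphas[i] / variances[i]) * r[i][k] * (x - mus[i])
          for k, x in enumerate(sample)] for i in range(m)]
    return [[sum(u * w for u, w in zip(b[i], b[j])) / N for j in range(m)]
            for i in range(m)]


def largest_eigenvalue(S, iters=1000):
    """Power iteration; for symmetric PSD S this approximates ||Q|| in <.,.>."""
    n = len(S)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient <v, S v> with ||v|| = 1
    return sum(v[i] * S[i][j] * v[j] for i in range(n) for j in range(n))


rng = random.Random(7)
data = []
for _ in range(450):
    u = rng.random()
    mean = -4.0 if u < 0.3 else (0.0 if u < 0.7 else 4.0)
    data.append(rng.gauss(mean, 1.0))
a_hat, mu_hat, var_hat = fit(data, [1 / 3, 1 / 3, 1 / 3], [-3.0, 0.5, 3.0],
                             [1.0, 1.0, 1.0], sweeps=300)
S = q_symmetrized(data, a_hat, mu_hat, var_hat)
q_norm = largest_eigenvalue(S)
print(q_norm, "< m =", len(a_hat))
```

Since $S$ is positive semi-definite, its largest eigenvalue is also at most its trace, which gives a cheap cross-check on the power-iteration estimate.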


3. Sufficient conditions for local convergence.

Sufficient conditions will now be given for local convergence of the procedure (3) to a solution of (2). Our first condition is given by the theorem below.

Theorem 1: Suppose that $m = 2$ and $N \ge 2n$, and let $\hat\alpha$, $\hat\mu$, $\hat\Sigma$ be vectors of parameters which satisfy the likelihood equations. If $\alpha$, $\mu$, and $\Sigma$ satisfy (2) and lie sufficiently near $\hat\alpha$, $\hat\mu$, and $\hat\Sigma$, then the iterative procedure (3) converges locally to $\mu$ with probability 1.
Proof: From the preliminary discussion, we know that the procedure (3) converges locally to $\mu$ if $\nabla_\mu M(\alpha, \mu, \Sigma)$ has operator norm less than 1 with respect to some vector norm on $\mathfrak{M}$. Then, since $\nabla_\mu M$ depends continuously on $\alpha$, $\mu$, and $\Sigma$, it suffices to find a norm on $\mathfrak{M}$ with respect to which $\nabla_\mu M(\hat\alpha, \hat\mu, \hat\Sigma)$ has operator norm less than 1 in order to prove the theorem.

Now $\nabla_\mu M(\hat\alpha, \hat\mu, \hat\Sigma) = I - Q$, where $Q$ is the operator introduced in the preliminary discussion. With probability 1, $Q$ is positive-definite as well as symmetric with respect to $\langle \cdot, \cdot \rangle$, and, from the Lemma, $\|Q\| < m = 2$. Hence every eigenvalue of $Q$ lies in $(0, 2)$, every eigenvalue of $I - Q$ lies in $(-1, 1)$, and consequently $\|\nabla_\mu M(\hat\alpha, \hat\mu, \hat\Sigma)\| < 1$ with probability 1. This completes the proof.
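Theorem 1 can be illustrated for $m = 2$ and $n = 1$: compute the two eigenvalues of $Q$ at an approximate solution of the likelihood equations, form $\rho = \|I - Q\|$, and compare it with the observed error reduction of procedure (3) started near $\hat\mu$, with $\alpha$ and $\Sigma$ fixed at $\hat\alpha$ and $\hat\Sigma$. As before, the inner products are assumed to be $\langle a, b \rangle_i = \hat\alpha_i \, a b / \hat\Sigma_i$, errors are measured in the induced norm, and all names, data, and the size of the starting perturbation are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Univariate normal density (n = 1)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def ratios(sample, alphas, mus, variances):
    """r[i][k] = p_i(x_k) / p(x_k)."""
    dens = [[normal_pdf(x, mu, v) for x in sample] for mu, v in zip(mus, variances)]
    p = [sum(a * d[k] for a, d in zip(alphas, dens)) for k in range(len(sample))]
    return [[d[k] / p[k] for k in range(len(sample))] for d in dens]


def M(sample, alphas, mus, variances):
    """The map M(alpha, mu, Sigma): right-hand side of (1.b)."""
    return [sum(w * x for w, x in zip(r, sample)) / sum(r)
            for r in ratios(sample, alphas, mus, variances)]


def fit(sample, alphas, mus, variances, sweeps=1000):
    """Approximate solution of (1.a)-(1.c) by repeated substitution."""
    N = len(sample)
    for _ in range(sweeps):
        r = ratios(sample, alphas, mus, variances)
        alphas = [a * sum(ri) / N for a, ri in zip(alphas, r)]
        mus = [sum(w * x for w, x in zip(ri, sample)) / sum(ri) for ri in r]
        variances = [sum(w * (x - mu) ** 2 for w, x in zip(ri, sample)) / sum(ri)
                     for ri, mu in zip(r, mus)]
    return alphas, mus, variances


def q_eigenvalues_2x2(sample, alphas, mus, variances):
    """Eigenvalues of Q for m = 2, n = 1, via the symmetric matrix
    G^{1/2} Q G^{-1/2} with G = diag(alpha_i / Sigma_i)."""
    N = len(sample)
    r = ratios(sample, alphas, mus, variances)
    b = [[math.sqrt(alphas[i] / variances[i]) * r[i][k] * (x - mus[i])
          for k, x in enumerate(sample)] for i in range(2)]
    s11 = sum(u * u for u in b[0]) / N
    s22 = sum(u * u for u in b[1]) / N
    s12 = sum(u * w for u, w in zip(b[0], b[1])) / N
    center, radius = (s11 + s22) / 2.0, math.hypot((s11 - s22) / 2.0, s12)
    return center - radius, center + radius


rng = random.Random(4)
data = [rng.gauss(-1.5, 1.0) if rng.random() < 0.5 else rng.gauss(1.5, 1.0)
        for _ in range(300)]
a_hat, mu_hat, var_hat = fit(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
lam_min, lam_max = q_eigenvalues_2x2(data, a_hat, mu_hat, var_hat)
rho = max(abs(1.0 - lam_min), abs(1.0 - lam_max))  # ||I - Q|| in <.,.>

g = [a / v for a, v in zip(a_hat, var_hat)]


def g_norm(e):
    """Norm induced by the assumed inner product on M (n = 1)."""
    return math.sqrt(sum(gi * ei * ei for gi, ei in zip(g, e)))


mu = [mu_hat[0] + 0.05, mu_hat[1] - 0.05]  # starting value near mu_hat
errors = []
for _ in range(60):
    errors.append(g_norm([u - v for u, v in zip(mu, mu_hat)]))
    mu = M(data, a_hat, mu, var_hat)  # procedure (3), alpha and Sigma fixed
print(lam_min, lam_max, rho, errors[4] / errors[3])
```

Because $I - Q$ is symmetric in this norm, the ratio of successive errors should not exceed $\rho$ by more than the second-order remainder; this is a spot check on one synthetic sample, not a substitute for the proof.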

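We now define an operator $Q^0$ on $\mathfrak{M}$ by

$$Q^0 = \int_{\mathbb{R}^n} \begin{pmatrix} \frac{p_1(x)}{p(x)} (x - \mu_1^0) \\ \vdots \\ \frac{p_m(x)}{p(x)} (x - \mu_m^0) \end{pmatrix} \left( \frac{p_1(x)}{p(x)} \langle x - \mu_1^0, \cdot \rangle_1, \; \ldots, \; \frac{p_m(x)}{p(x)} \langle x - \mu_m^0, \cdot \rangle_m \right) p(x) \, dx,$$

in which the true parameters (whose vectors we denote by $\alpha^0$, $\mu^0$, and $\Sigma^0$) are used in evaluating the functions $p_i$ and $p$ and the inner products $\langle \cdot, \cdot \rangle_i$. The operator $Q^0$ can be thought of as an $m \times m$ array of operators on $\mathbb{R}^n$, the $ij$-th operator of which is

$$Q^0_{ij} = \int_{\mathbb{R}^n} \frac{p_i(x)}{p(x)} (x - \mu_i^0) \, \frac{p_j(x)}{p(x)} \langle x - \mu_j^0, \cdot \rangle_j \, p(x) \, dx.$$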


If the component normal densities in $p$ are "widely separated" in the sense that each pair of parameters $\mu_i^0$ and $\Sigma_i^0$ differs greatly from every other pair, then the off-diagonal operators in this array are near zero. On the other hand, regardless of the "separation" of the component densities, the diagonal operators define an operator on $\mathfrak{M}$ which lies strictly between the zero operator and the identity operator in the ordering on symmetric operators defined by the inner product $\langle \cdot, \cdot \rangle$. Consequently, if the component normal densities in $p$ are sufficiently "widely separated" in this sense, then the operator $I - Q^0$ has spectral radius less than 1, and, hence, there exists a norm on $\mathfrak{M}$ with respect to which $I - Q^0$ has operator norm less than 1. (See Householder [4].) This motivates our second condition.
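For $n = 1$ and $m = 2$ the integral defining $Q^0$ can be approximated by a Riemann sum, which makes the effect of "separation" visible. The sketch below assumes $\langle a, b \rangle_i = \alpha_i^0 \, a b / \Sigma_i^0$, works with the symmetric matrix $G^{1/2} Q^0 G^{-1/2}$ (which has the same eigenvalues as $Q^0$), and uses hypothetical names, a hypothetical grid, and illustrative parameter values.

```python
import math


def normal_pdf(x, mu, var):
    """Univariate normal density (n = 1)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def q0_symmetrized(alphas, mus, variances, lo=-25.0, hi=25.0, h=0.01):
    """Riemann-sum approximation of G^{1/2} Q^0 G^{-1/2} for m = 2, n = 1,
    where G = diag(alpha_i^0 / Sigma_i^0) encodes the assumed inner products."""
    g = [a / v for a, v in zip(alphas, variances)]
    S = [[0.0, 0.0], [0.0, 0.0]]
    steps = int(round((hi - lo) / h))
    for k in range(steps + 1):
        x = lo + k * h
        dens = [normal_pdf(x, mu, v) for mu, v in zip(mus, variances)]
        p = sum(a * d for a, d in zip(alphas, dens))
        # b_i(x) = sqrt(g_i) * (p_i(x) / p(x)) * (x - mu_i^0)
        b = [math.sqrt(g[i]) * dens[i] / p * (x - mus[i]) for i in range(2)]
        for i in range(2):
            for j in range(2):
                S[i][j] += b[i] * b[j] * p * h
    return S


def spectral_radius_of_I_minus(S):
    """Spectral radius of I - S for a symmetric 2 x 2 matrix S."""
    center = (S[0][0] + S[1][1]) / 2.0
    radius = math.hypot((S[0][0] - S[1][1]) / 2.0, S[0][1])
    return max(abs(1.0 - (center - radius)), abs(1.0 - (center + radius)))


separated = q0_symmetrized([0.5, 0.5], [-10.0, 10.0], [1.0, 1.0])
overlapping = q0_symmetrized([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print(separated, spectral_radius_of_I_minus(separated))
print(overlapping, spectral_radius_of_I_minus(overlapping))
```

For the widely separated pair the off-diagonal entry is negligible and the diagonal entries are close to 1, so the spectral radius of $I - Q^0$ is close to 0; for the overlapping pair the off-diagonal entry is appreciable and the spectral radius is noticeably larger.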

Theorem 2: Suppose that the component normal densities in $p$ are sufficiently "widely separated" that the spectral radius of $I - Q^0$ is less than 1. Then the probability is 1 that, for sufficiently large $N$, there exist neighborhoods of $\alpha^0$, $\mu^0$, and $\Sigma^0$ such that, if $\alpha$, $\mu$, and $\Sigma$ lie in these neighborhoods and satisfy (2), then the iterative procedure (3) converges locally to $\mu$.

Proof: A straightforward calculation and an application of the Strong Law of Large Numbers (see Loève [5]) yields that $\nabla_\mu M(\alpha^0, \mu^0, \Sigma^0)$ converges with probability 1 to $I - Q^0$ as $N$ approaches infinity. Since $I - Q^0$ is assumed to have spectral radius less than 1, it follows that, with probability 1, if $N$ is sufficiently large, then $\nabla_\mu M(\alpha, \mu, \Sigma)$ has operator norm less than 1 with respect to some norm on $\mathfrak{M}$ whenever $\alpha$, $\mu$, and $\Sigma$ lie near $\alpha^0$, $\mu^0$, and $\Sigma^0$. If $\nabla_\mu M(\alpha, \mu, \Sigma)$ has operator norm less than 1 and $\alpha$, $\mu$, and $\Sigma$ also satisfy (2), then the iterative procedure (3) converges locally to $\mu$.

This completes the proof.
Appendix

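We now prove that the operator

$$Q = \frac{1}{N} \sum_{k=1}^{N} \begin{pmatrix} \frac{p_1(x_k)}{p(x_k)} (x_k - \hat\mu_1) \\ \vdots \\ \frac{p_m(x_k)}{p(x_k)} (x_k - \hat\mu_m) \end{pmatrix} \left( \frac{p_1(x_k)}{p(x_k)} \langle x_k - \hat\mu_1, \cdot \rangle_1, \; \ldots, \; \frac{p_m(x_k)}{p(x_k)} \langle x_k - \hat\mu_m, \cdot \rangle_m \right)$$

is positive-definite on $\mathfrak{M}$ with probability 1 whenever $N \ge mn$. Since $\langle v, Qv \rangle = \frac{1}{N} \sum_{k=1}^{N} \langle V(x_k), v \rangle^2$, where

$$V(x_k) = \begin{pmatrix} \frac{p_1(x_k)}{p(x_k)} (x_k - \hat\mu_1) \\ \vdots \\ \frac{p_m(x_k)}{p(x_k)} (x_k - \hat\mu_m) \end{pmatrix}, \qquad k = 1, \ldots, N,$$

it suffices to show that the vectors $V(x_k)$, $k = 1, \ldots, N$, span $\mathfrak{M}$ with probability 1 whenever $N \ge mn$. This follows from the more general result below.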

Lemma. Let $\{x_k\}_{k=1,\ldots,N}$ be an independent sample of observations on a random variable $x$ in $\mathbb{R}^n$ which is distributed with a probability density function $p$. If $V$ is a real-analytic function from $\mathbb{R}^n$ to $\mathbb{R}^t$ whose component functions are linearly independent, then the vectors $V(x_k)$, $k = 1, \ldots, N$, span $\mathbb{R}^t$ with probability 1 whenever $N \ge t$.
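Proof: Denoting the $j$-th component function of $V$ by $v_j$, we define a real-analytic function $V_j$ from $\mathbb{R}^n$ to $\mathbb{R}^j$ by

$$V_j(x) = \begin{pmatrix} v_1(x) \\ \vdots \\ v_j(x) \end{pmatrix}$$

for $j = 1, \ldots, t$. Our proof of the lemma consists of showing inductively that, for $j = 1, \ldots, t$, the set $\{V_j(x_k)\}_{k=1,\ldots,j}$ spans $\mathbb{R}^j$ with probability 1.

We make the preliminary observation that, since the real-analytic functions $v_j$ are assumed to be linearly independent, any non-zero linear combination of them vanishes only on a set of Lebesgue measure zero in $\mathbb{R}^n$.

From the observation above, $V_1(x_1) = v_1(x_1)$ is non-zero with probability 1; hence $\{V_1(x_1)\}$ spans $\mathbb{R}^1$ with probability 1. Suppose now that, for some $j$, $1 \le j < t$, the set $\{V_j(x_k)\}_{k=1,\ldots,j}$ spans $\mathbb{R}^j$ with probability 1. Then the vectors $V_{j+1}(x_1), \ldots, V_{j+1}(x_j)$ are linearly independent with probability 1, so that, with probability 1, the set $\{V_{j+1}(x_k)\}_{k=1,\ldots,j+1}$ fails to span $\mathbb{R}^{j+1}$ if and only if

$$V_{j+1}(x_{j+1}) = \sum_{k=1}^{j} c_k V_{j+1}(x_k) \tag{$*$}$$

for some set of constants $\{c_k\}_{k=1,\ldots,j}$. If $(*)$ holds, the constants $c_k$ are determined by

$$\begin{pmatrix} c_1 \\ \vdots \\ c_j \end{pmatrix} = A_j^{-1} V_j(x_{j+1})$$

with probability 1, where $A_j$ is the $j \times j$ matrix whose $k$-th column is $V_j(x_k)$. Thus, with probability 1, $(*)$ holds if and only if

$$v_{j+1}(x_{j+1}) - \big( v_{j+1}(x_1), \ldots, v_{j+1}(x_j) \big) A_j^{-1} V_j(x_{j+1}) = 0.$$

Now, for fixed $x_1, \ldots, x_j$, the function

$$v_{j+1}(x) - \big( v_{j+1}(x_1), \ldots, v_{j+1}(x_j) \big) A_j^{-1} V_j(x)$$

is a non-zero linear combination of the functions $v_1, \ldots, v_{j+1}$ (its coefficient of $v_{j+1}$ is 1) and, hence, vanishes only on a set of Lebesgue measure zero in $\mathbb{R}^n$. Since $x_{j+1}$ is independent of $x_1, \ldots, x_j$ and has a density, one concludes that $\{V_{j+1}(x_k)\}_{k=1,\ldots,j+1}$ fails to span $\mathbb{R}^{j+1}$ with probability zero. This completes the induction, and the lemma is proved.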
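The lemma can be illustrated with a simple real-analytic $V$ for $n = 1$: the monomials $1, x, \ldots, x^{t-1}$ are linearly independent, and the matrix whose columns are $V(x_1), \ldots, V(x_t)$ is a Vandermonde matrix with determinant $\prod_{i<j}(x_j - x_i)$, which is non-zero whenever the sample points are distinct. The sketch below (hypothetical names; sample drawn from the uniform density on $[-1, 1]$) computes the determinant by Gaussian elimination and contrasts it with a $V$ whose components are linearly dependent, for which the vectors $V(x_k)$ can never span.

```python
import random


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda row: abs(a[row][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= factor * a[c][cc]
    return result


def column_matrix(V, points):
    """The t x t matrix whose k-th column is V(x_k)."""
    cols = [V(x) for x in points]
    return [[col[i] for col in cols] for i in range(len(cols[0]))]


T = 5


def monomials(x):
    """Real-analytic V: R -> R^T with linearly independent components 1, x, ..., x^(T-1)."""
    return [x ** i for i in range(T)]


def dependent(x):
    """A V whose third component is the sum of the first two (hypothesis violated)."""
    return [1.0, x, 1.0 + x, x * x, x ** 3]


rng = random.Random(5)
pts = [rng.uniform(-1.0, 1.0) for _ in range(T)]  # N = t independent observations
d_indep = det(column_matrix(monomials, pts))
d_dep = det(column_matrix(dependent, pts))
vandermonde = 1.0  # closed form: product over i < j of (x_j - x_i)
for i in range(T):
    for j in range(i + 1, T):
        vandermonde *= pts[j] - pts[i]
print(d_indep, vandermonde, d_dep)
```

A single draw cannot confirm a probability-1 statement; the point of the contrast is that the determinant vanishes identically (up to rounding) exactly when the linear-independence hypothesis fails.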

